Applying Free Random Variables 
to Random Matrix Analysis of Financial Data 
Part I: A Gaussian Case 



Zdzislaw Burda,^'HAndrzej Jarosz,-Hj Jerzy Jurkiewicz/'[|| 
Maciej A. Nowak,Ml Gabor Papp.^'ff and Ismail Zahed^-Q 



'Marian Smoluchowski Institute of Physics and Mark Kac Complex Systems Research Centre, 
Jagiellonian University, Reymonta 4, 30-059 Krakow, Poland 

^Clico Ltd., Oleandry 2, 30-063 Krakow, Poland 

'^Institute for Physics, Eotvos University, 1518 Budapest, Hungary 

Department of Physics and Astronomy, SUNY Stony Brook, NY 11794, USA 

(Dated: January 18, 2010) 

We apply the concept of free random variables to doubly correlated (Gaussian) Wishart ran- 
dom matrix models, appearing for example in a multivariate analysis of financial time series, and 
displaying both inter-asset cross-covariances and temporal auto-covariances. We give a compre- 
hensive introduction to the rich financial reality behind such models. We explain in an elementary 
way the main techniques of the free random variables calculus, with a view to promote them in the 
quantitative finance community. We apply our findings to tackle several financially relevant prob- 
lems, such as of an universe of assets displaying exponentially decaying temporal covariances, or the 
exponentially weighted moving average, both with an arbitrary structure of cross-covariances. 
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A. Financial Cross— Correlations and Auto— Correlations From Gaussian Random Matrix Theory 



Cross-correlations between assets traded on the markets play a critical role in the practice of modern-day financial 
institutions. For example, correlated moves of assets diminish a possibility of optimal diversification of investment 
portfolios, and precise knowledge of these correlations is fundamental for optimal capital allocation: in the classical 
Markowitz mean-variance optimization theory all the correlations must be perfectly known. Not only so, but a 
future forecast of the correlations would be highly desirable. However, the information about cross-correlations and 
their temporal dynamics is typically inferred from historical data, stored in the memories of computers (usually in 
the form of large matrices); the material decoded from past time series is inevitably marred by measurement noise, 
and it is a constant challenge to unravel signal from noise. 



I. INTRODUCTION 



1. Financial Correlations and Portfolio Optimization 
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One may argue that from the point of view of practical portfoho optimization, cross-correlations are only a 
"second-order effect," since they represent fluctuations around a certain trend described by the average returns of 
assets. Now it is well-known that the future returns are to a great extent impossible to determine from the historical 
returns (one would need to take a very long time series to account for volatilities, which would manifest a highly 
non-stationary nature of the returns' distribution), and are thus subject to individual assessment by managers. We 
will disregard this problem in the present analysis, and assume that investors have some expectations of the future 
returns, recalling at the same time that volatilities and cross-correlations should be much more stable in time (due to 
the phenomenon of heteroscedasticity, i.e., time-dependence of volatility), hence allowing some level of predictability 
of their future values from historical data, once the measurement noise has been properly cleaned. 

Another reservation (see for example [2j]) may be that optimization leads to a solution for the portfolio weights 
dependent on the covariance matrix in a stable way (in the sense that a small modification of the covariances leads to 
only a small change in the portfolio composition and risk) only in simplest cases of linear constraints on the weights, 
while for non-linear constrains (present for example on futures markets) the problem becomes equivalent to finding 
the energy minima of a spin glass system, and there is an exponentially large (in the number of assets) number of these 
local minima with an unstable (chaotic) dependence on the covariance matrix. Again, we will restrict our interest 
to portfolios with linear constraints on the weights, in which case the estimation noise of the covariance matrix is 
already troublesome enough for special techniques of its elimination to be unavoidable. 

Keeping these limitations in mind, we conclude that correlations between assets necessarily have to be included 
in any risk analysis (portfolio optimization, option pricing). Even more importantly than investment purposes, one 
should investigate the covariance matrix out of a theoretical motivation, as its structure mirrors the interdependencies 
of companies, as well as possesses a non-trivial temporal dynamics; both qualities crucial for better understanding of 
the underlying mechanisms governing the behavior of financial markets. 



2. The Gaussian Approxtmation 



The primary way to describe not only the volatilities but also cross-correlations in a universe of some N assets is 
through the two-point covariance function, 

Ciajb = {RiaRjb) ■ (1) 

We define r^a = log Si^a+i — log Sia ~ {Si^a+i — Sia)/ Sia to be the return of an asset i = 1, . . . , N over a time interval 
a = 1, . . . ,T, i.e., (approximately) the relative change of the asset's price Sia between the moments of time aSt and 
(a + l)5t, where 6t is some elementary time step. (We disregard the known small "leverage effect" of anomalous 
skewness in the distribution of the returns: that the price increments, rather than the returns, behave as additive 
random variables.) Moreover, we denote Ria = — (ria), which describe the fluctuations (with zero mean) of the 
returns around the trend, and collect them into a rectangular N x T matrix R. The average (. . .) is understood as 
taken according to some probability distribution whose functional shape is stable over time, but whose parameters 
may be time-dependent. 

In this paper, we will employ a very simplified form of the two-point covariance function ([T]), namely with cross- 
covariances and auto-covariances factorized and non-random, 

^ia,jb — CijA^ab (2) 

(we will collect these coefficients into an x cross-covariance matrix C and a T x T auto-covariance matrix A; 
both are taken symmetric and positive-definite). We will discover that the matrix of "temporal covariances" A is a 
way to model two temporal effects: the (weak, short-memory) lagged correlations between the returns (see par. II C it . 
as well as the (stronger, long-memory) lagged correlations between the volatilities (heteroscedasticity; par. II C 2|) . On 
the other hand, the matrix of cross-covariances ("spatial covariances," using a more physical language) C models the 
hidden factors affecting the assets, thereby reflecting the structure of mutual dependencies of the market companies 
fpar. lIB 2t . Importantly, both contributions are decoupled: the temporal dependence of the distribution of each asset 
is the same, and the structure of cross-correlations does not evolve in time; this is quite a crude approximation. Also, 
these are all fixed numbers; only in a subsequent work do we plan to explore another known way (a "random parametric 
deformation") of modeling temporal dependence of cross-covariances, that is, considering C to be a random matrix 
of a given probability distribution. 
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For our approach to be valid, both covariance matrices obviously must be finite. In fact, the current article deals 
exclusively with the multivariate Gaussian distribution for the assets' returns which displays the two-point covariances 



1 / 1 ^ ^ \ 

— ^-P - 2 E E R,, [A-i] DR 
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= —. — exp — TrR'^C-iRA-i DR, (3) 
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where the normalization constant A/'c.c. = (27r)^-'"/^(DetC)-'"/^(DetA)^/^, and the integration measure 
DR = ^ di?ia ; the letters "c.G." stand for "correlated Gaussian," and the expectation map w.r.t. this dis- 
tribution will be denoted by (. . .)c.g.) while """"" denotes matrix transposition. In this case, the covariance matrices 
are finite, and moreover they are sufficient to fully characterize the dependencies of the RiaS. However, in more 
realistic situations, such as of financially relevant distributions having heavy power-law tails (with an exponent /z), 
the two-point covariance (in particular C) may not exist. The precise answer boils down to how these correlated 
non-Gaussian distributions are defined (one can exploit a variety of methods: a linear model, a copula, a random 
deformation of some kind, a method based on freeness, on radial measures, etc.; see for example [3|, sections 9.2, 
12.2.3, 12.2.4, and and we will postpone this discussion to forthcoming communications. Let us just mention 

that in some cases (such as a linear model of Levy stable variables, i.e., with /i < 2), C diverges, and another 
measure of covariance should be devised (for example, the "tail covariance," which quantifies the amplitude of the 
tail and asymmetry of the power-law product variable RiaRjb)', even for fj, > 2, when C exists, it is informative 
(especially from the point of view of portfolio optimization in the sense of minimizing value-at-risk) to use the tail 
covariance, as it focuses on large negative events. However, we henceforth restrict our attention to only the correlated 
Gaussian distribution ([31); on this simplest (and admittedly quite distant from reality) example we wish to advocate 
our approach, with a view to further generalize it to heavy-tailed distributions. 



3. Free Random Variables 



A decade ago, Bouchaud et al. and Stanley et al. (7|, |8[ suggested the use of Gaussian random matrix theory 
(RMT) for addressing the issue of noise in financial correlation matrices. Since then, a number of results regarding 
the quantification of noise in financial covariances have been derived using this tool [9|-|25|, some of which maybe of 
relevance to risk management. 

In this work, we would like to advertise the concepts of the free random variables (FRV) calculus as a powerful 
alternative to standard random matrix theory, both for Gaussian and non-Gaussian noise. FRV may be thought of as 
an abstract non-commutative generalization of the classical (commutative) probability calculus, i.e., a mathematical 
framework for dealing with random variables which do not commute, examples of which are random matrices. (Indeed, 
FRV was initiated by Voiculescu et al. and Speicher ^2^, H^l as a rather abstract approach to von Neumann algebras, 
but it has a concrete realization in the context of RMT, since large random matrices can be regarded as free random 
variables.) Its centerpiece is a mathematical construction of the notion of freeness, which is a non-commutative 
counterpart of classical independence of random variables. As such, it allows for extending many classical results 
founded upon the properties of independence into the non-commutative (random matrix) realm, particularly the 
algorithms of addition and multiplication of random variables, or the ideas of stability, infinite divisibility, etc. 
This introduces a new quality into RMT, which simplifies, both conceptually and technically, many random matrix 
calculations, especially in the macroscopic limit (the bulk limit, i.e., random matrices of infinite size), which is of 
main interest in practical problems. 

Several years ago, we suggested that FRV is very useful for addressing a much larger class of noise (Levy) in the 
context of financial covariance matrices in a way that is succinct and mostly algebraic ^28rf33j . These results have 
now seen further applications to financial covariances Is^-IsgI and macroeconomy [s^]- Also, FRV has been already 
applied to a number of problems ranging from physics j38l - l4Cl| | to wireless telecommunication [4l| - |4^ . 

The primary aim of this publication is to advertise the framework of FRV to the audience of quantitative finance 
(QF) by re-deriving several known results obtained earlier through other (more laborious) methods of Gaussian RMT, 
as well as solving a few problems for the first time. This illustrates the fluency of the FRV calculus for noisy financial 
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covariances. In particular, we show how an FRV-based back-of-an-envelope calculation leads to a simple equation 
([6T|) for a function AI = AIc{z) ((32|) that generates all the moments {i.e., contains all the spectral information) of a 
historical estimator c of the cross-covariance matrix C ([2]) , in the presence of arbitrary "true" cross-covariance 
and auto-covariance matrices C and A, 

z = rMNx{rM)Nc{M). (4) 

Here the A^'s are certain functions (p4)) . computable once C and A are known; and r = N/T. (The unrealistic 
assumption about the Gaussian statistics of the financial assets' returns will be relaxed only in subsequent papers, 
where generalizations of the current findings, among them to the Levy FRV calculus will be presented; although 
randomly sampled Levy matrices have one or even no finite spectral moments, FRV permits a straightforward analysis 
of the pertinent moments' generating functions and thus the corresponding spectral distributions.) 

This article is organized as follows: 

• The remainder of this section |I] is devoted to discussing the motivations, meaning, and applicability of our results 
to the financial reality. Our working assumption of Gaussianity, its limitations and possible extensions, have 
already been touched upon in par. II A 21 In subsec. II Bl various commonly used models for the cross-covariance 
matrix C are presented, which may be used as an input for equation Q. This application is however limited 
by the appearance of large eigenvalues in the spectrum of C, for which we give a brief justification; it calls for a 
more refined analysis than currently allowed by FRV. Subsec. II CI deals with auto-covariances A. We show that 
our method is well-poised to investigate the weak short-ranged auto-correlations observed on financial markets, 
triggered by non-zero transaction costs and the presence of bonds or interest rates, and included in modern 
risk evaluation methodologies, but fails at this stage to handle more involved auto-covariances between different 
assets, such as required for example to explain the Epps effect. We describe also the much more important long- 
memory auto-correlations between the random volatility of the returns (heteroscedasticity) , and introduce some 
corresponding weighting schemes, mainly the exponentially weighted moving average (EWMA). In subsec. II Dl 
we define and discuss several standard historical estimators of the covariance matrices (Pearson, time-lagged, 
weighted) . 

• Section HI] is the centerpiece of this work. It commences in subsec. Ill AI with a crash course in free random 
variables, with a particular focus on the addition and multiplication algorithms of free random matrices; the 
latter constitutes the chief tool we exploit. It has a more operational flavor, designed to aid practical applications 
by the QF community rather than to delve into the mathematics behind the scenes. Subsec. IIIBI gives a 
foretaste of the power of the FRV approach by re-computing in an algebraic way the famous Marcenko-Pastur 
distribution. The salient part comes in subsec. FlI CI where equation along with its variations, is derived and 
presented. 

• Section inil contains two examples of how the main formula ^ can be used to tackle financially relevant problems. 
Namely, we find equations for the moments' generating functions M of the standard and time-delayed historical 
estimators in the presence of exponentially decaying temporal auto-covariances and arbitrary cross-covariances 
(subsec. nil A[) . as well as of the EWMA with arbitrary cross-covariances (subsec. IIII B[) . In the simplest case 
of no underlying cross-covariances, we reinforce our analytical findings with numerical simulations. 

• The article terminates with short conclusions and some possible prospects for the future in section ITVl (more are 
actually given inside the body of the article), as well as a list of references. 



B. Modeling Cross— Correlations 

1. Principal Component Analysis 

In this subsection, we consider a fixed moment in time, a, and investigate the cross-correlations between the 
assets; assuming the decoupling ([2]), their structure C does not depend on time. Being symmetric and positive- 
definite, C can be diagonalized, Cv^ = AfcVfc, with N real and positive eigenvalues (they can be ordered decreasing, 
Ai > . . . > Aat > 0), and N orthogonal and normalized eigenvectors. This diagonalization is called a "principal com- 
ponent analysis" (PGA), because the knowledge of the eigenvectors of C allows to linearly transform the N correlated 
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entities Ria into N uncorrelatcd ones (referred to as "principal components" or "explicative factors") Cka, whose 
variances are given by the eigenvalues of C, 

N N 

eka = ^VkARia, or conversely, Ria ^^Vk,ieka, where (ekaeia) ^ >^kSki- (5) 

i=l fe=l 

In other words, the PCA unravels the uncorrelated (not necessarily independent) factors affecting the collection of 
assets, and orders them w.r.t. their decreasing volatility. Since a factor is a certain mix of the assets (i.e., a portfolio), 
we can restate it yet differently by that the PCA derives a set of uncorrelated investment portfolios, and orders them 
w.r.t. their decreasing risk. 



2. Factor Models 



There have been put forth models of the structure of the covariance matrix (see for example Q, section 9.3). They 
are to reflect the structure of the spectra of its estimators built from historical financial data (see subsec. ITD)) . which 
typically consist of one very large eigenvalue, several smaller but still large ones, and a sea of small eigenvalues, whose 
distribution can be very accurately fitted with the Marcenko-Pastur distribution [45| of the eigenvalues of a purely 
random matrix belonging to the (uncorrelated) Wishart ensemble (46j . As we discuss in more detail in subsec. II Dl 
an estimator of the covariance matrix will necessarily contain a lot of measurement noise, and these decade-old 
results 0, HI, pioneering the use of random matrix theory in financial applications, suggest that actually most of the 
spectrum is purely random, with the exception of the largest eigenvalues "leaking out" of the bulk Marcenko-Pastur 
distribution, which do carry some information about the true correlations between the assets. For example, the 
appearance of one very large eigenvalue Ai (c.a. 25 times greater than the upper bound of the noise distribution for 
the S&P500 data in Q) has the following meaning: Since ei fluctuates with such a dominating volatility, the PCA ([5]) 
can be approximated as Ri ~ wi,iei (we skip the index a in this paragraph), which means that the dynamics of all the 
assets is governed practically by just one factor. The corresponding eigenvector has roughly all the components equal, 
which thus represents a portfolio with approximately the same allocation in all the assets, i.e., with no diversification; 
this portfolio will therefore be strongly correlated with the market index, and is thus called the "market factor." Its 
presence in the empirical spectrum may be understood for instance in terms of the herding phenomenon (a collective 
behavior of the investors). Similarly, other large eigenvalues seen in the spectra of historical covariance matrices 
can be attributed to the clustering of individual companies into industrial sectors (constructed by investigating the 
relevant eigenvectors), within which the correlations are strong. 

Consequently, one may attempt to model the matrix C in order to reproduce such empirical observations; this 
program goes under a name of "factor component analysis" (FCA), since it aims at describing a large number of 
cross-correlations between assets in terms of their correlations with a much smaller number of factors. To begin with, 
one considers a "one-factor model" ("market model") [IJ, where it is assumed that each return Ri is impacted by 
the market return 0o with some strength (3i (named the "market beta" of this asset), and besides that there is no 
correlation between assets; namely, it approximates Ri — l3i(j)Q + Ci, where 0o a-nd the e^'s (called the "idiosyncratic 
noise" ; their presence implies that the underlying factors cannot be directly observed as they are marred by random 
errors) are uncorrelated and have the volatilities E and Ci respectively; the model is described by (2A^ + 1) parameters. 
The covariance matrix thus reads 

a, =E^P,(3j+a^5,,. (6) 

It can be easily diagonalized under an additional simplification that all the cxi's are equal to some ag, in which case 
there is one large (oc N; we generically assume N to be large, see ([H])) eigenvalue Ai = + o'o, corresponding 

to vi cx /3 (the "market"), and an {N — l)-degenerate eigenvalue ctq (if the CTj's are unequal but of comparable size, 
there is a large market eigenvalue and a sea of (A*" — 1) small ones). 

A more refined "multi-factor model" [31313: describing increased correlations within industrial sectors, as- 
sumes that the idiosyncratic (non-market) parts Ci are exposed to K hidden factors (j)a, namely Ci = Pia'Pa + 
where all the factors and the new idiosyncratic terms are uncorrelated and have the volatilities and Gi respectively. 
In this case, 

K 

C,j = + ^« + oj5,j . (7) 

a=l 
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To further (quite drastically) simplify this model, we may consider that each asset i is exposed to only one factor 
a, with Na assets belonging to sector a iJ2a=i = N), which translates into (3ia = f3(^a'i)a = ^a'aPj"' , where the 
index i is split into a double-index (a'i), with a' enumerating sectors and i assets within sector a' . Also we consider 
the exposures to the market negligible as compared to the industrial ones = 0), and the idiosyncratic volatilities 
depending only on the sector, ai = cr(Q,i) = ctq. Then C acquires a block-diagonal form, which is easily diagonalized 
to give K "large" eigenvalues Xa = S^/S^"-*^ + tr^ plus K "small" {Na — l)-degenerate eigenvalues cr^. It mirrors 
more properly the existence of several large eigenvalues. The form of the covariance matrix can also b e sp ecified along 
analogous lines in more complex ways, such as in the "hierarchically nested factor model" (HNFM) I50|. 

Any such model can be used as a non-statistical input for equation ([4]) . In this way, the number of parameters 
to be estimated (for which the Pearson's chi-square method may be used) is typically greatly reduced, however on 
the cost of a specification error. Another problem with the above models is that their generic feature is the existence 
of isolated "large" eigenvalues {i.e., proportional to the size of the portfolio N and with a microscopic degeneracy, 
typically singlets), while the FRV tools seem not adequate enough to tackle such situations yet, being restricted to 
the bulk of the distribution, and thus other methods should be employed (see for example |5l|). This is why in this 
article we refrain from using our main formula (0]) for any nontrivial cross-covariance matrix C, focusing rather on 
temporal covariances; this obstacle should certainly be dealt with. 



C. Modeling Auto— Correlations 



1. Lagged Correlations Between the Returns 



Let US now discuss which empirical facts concerning temporal correlations can be modeled (and how) within 
our very simplified framework First, it is well-known that the returns are weakly auto-correlated on short 
time scales: the delayed correlation function (see for its definition) is significantly different from zero (and, for 
example, negative for stocks, but positive for stock indices) for the time lags less than c.a. 30 minutes for liquid and 
free-floating assets (longer on less liquid markets; this decay lag also decreases with time), see sections 6.2, 13.1.3. 
The simplest and most natural model for such a behavior, for a single asset i, is an exponential decay. 



^ ^-\b~a\/T 



with the characteristic time r (given here in the units of St) of the order of several minutes. 



(8) 



Let us emphasize that we are talking about correlation functions here, i.e., normalized by the variance. The 
variance have completely different, stronger, long-memory temporal dynamics (heteroscedasticity; see par. II C 2p . 
allowing forecasts of future volatilities from past data. From this point of view, heteroscedasticity is a "first-order 
correction" to an iid of the returns, while the auto-correlation of the returns (such as ([5])) is a "second-order effect." 
Consequently, any long-term forecast of the mean returns seems impossible, and we will focus on forecasting the 
volatility, used then to assess the short-term risk of a portfolio. Anyway, on short time horizons (such as one business 
day), the mean return is negligible (say, a small fraction of a percent for stocks) as compared to its volatility (a few 
percent). 

These weak auto-correlations should not however be disregarded. Actually, they may persist on longer time 
scales, such as days, but to reveal that, one would need to consider much longer (decades) historical time series in 
order to decrease the estimation noise; there might even be auto-correlations over time spans of a few years, refiecting 
the existence of economic cycles. If detectable auto-correlations were present for longer lags, they could be used to 
devise a profitable trading strategy until arbitrage would remove them; this is the efficient market hypothesis. But 
even such weak and short-memory auto-correlations allow in principle to attain large profits in high-frequency (HF) 
trading; however, when transaction costs are taken into account, these profits are precisely discounted, and this is 
one reason that a non-zero decay lag is allowed without contradicting the efficiency of the markets. Another reason 
is the existence of riskless assets (bonds), which implies that stocks should gain on average the riskless rate of return 
and additionally a risk premium; moreover, short-term interest rates are not free-floating (they are set by the central 
banks and are usually quite predictable). For these reasons, the auto-correlations, albeit weak, begin to be included 
into new risk evaluation methodologies, such as RiskMetrics 2006 [s^, which even attempts to predict the mean 
returns (and thus risks) for long-time horizons, up to one year. Our approximation ^ should be able to handle 
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effects like ([S]), and in subsec UlI Al we indeed show an application of equation to a model with an exponentially 
decaying auto-covariance matrix A. 

However, this phenomenon of non-zero lagged correlations should be extended from a single asset to multiple 
variables: the returns of different assets are correlated between different time moments. This is crucial, for example, 
for gaining insight into the dynamics of the cross-correlations as we progress from the HF time scale to longer time 
scales. The strength of the equal-time cross-correlation between assets i ^ j was long ago observed to grow with 
increasing 5t (when moving from higher to lower sampling frequencies, the equal-time inter-asset correlations rapidly 
rise on the scales of several minutes, to saturate on the scales of days), which is called the "Epps effect" [53j. Not only 
this, but the entire structure of cross-correlations (depicted through the maximum spanning tree of the market jsij ) 
evolves as an embryo which expands and differentiates as 5t enlarges. This change of strength and structure of cross- 
correlations is, for instance, critical for the possibility of increasing the sampling frequency in order to obtain longer 
time series, and as a result, less noisy historical estimators (see par. IIP ip : one cannot probe too deep into the HF 
regime (far from the saturation level of the Epps curve) since then the entire tree of cross-correlations looks totally 
different. This is yet another reason for developing noise-cleaning procedures, such as the one advocated in this paper. 
In order to explain the Epps effect, a causal relation must be present between the time evolution of the return of asset 
i at a certain time and the returns of all the other assets j 7^ i at all the previous moments. For example, in (55l. [56|. 
the equal-time cross-correlations are expressed through delayed cross-correlations over shorter time scales, and the 
latter are modeled by a direct analog of the exponential decay ([S]), just with separate i and j, 

{RiaRja} 

this eventually proves to provide an analytical shape of the Epps curve which is remarkably close to the experimental 
one. Another, more complex model of linear causal influence is presented in [ssj . 

n{t) = e,{t) + ^ / dt'K,, {t - t') T, (<') , (10) 

i=i 

where ei(i) is an idiosyncratic part, and Kij{t — t') is called the "influence kernel." However, these models cannot 
be captured by our current simple framework ([2]), and it certainly is an interesting challenge to extend the FRV 
approach so that it could be helpful in an analytical treatment of such more involved correlations, responsible among 
other things for the Epps effect. 



2. Heteroscedasticity 



A well-established "stylized fact" observed in all financial time series is that the (say, daily) volatility of an asset's 
return depends on time, displaying a "long memory," namely that periods of high or low volatility tend to persist 
over time; this property is known as "heteroscedasticity," "volatility clustering," or "intermittence" (by analogy with 
turbulent flows of fluids, where a similar phenomenon of persistent intertwined periods of laminar and turbulent 
behavior occurs). A standard approach to describe mathematically this experimental fact is to suppose that not only 
is the demeaned and normalized return e^a (the "residual") a random variable, but so is the volatility iTia, 

Ria = CTiaCia- (H) 

These two sources of randomness are to a great extent inseparable, and it becomes a matter of choice how to model 
them in order to jointly arrive at results which comply with empirical data; for example, a Student distribution for 
the return can originate from an inverse-gamma randomness of the variance superimposed on a Gaussian iid of the 
residuals. Such a very general depiction (jlip . with the eia's assumed to be iid with zero mean and unit volatility and 
the CTia's some random variables possibly correlated with each other and also with the residuals, is named a "stochastic 
volatility model" ; see Q , section 7. 

Indeed, the volatilities at different time moments are correlated. These lagged correlations are not strong (several 
percent, depending on the volatility proxy used and the time lag chosen), but stretch over much longer periods than 
the lagged correlations of the residuals, discussed in par. II C II their slow decay (long-memory) can be modeled well 
by a power law, l/\b — aj", where a fit of the exponent v typically lies in the range 0.2-^0.4 (depending on the domain 
of the time lags considered). In other words, the temporal dynamics of the volatility is a multi-scale phenomenon. 
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Moreover, the probability distribution of (an estimator, such as the "high-frequency proxy," being the daily average 
of HF returns, of) the volatility may be approximated by an inverse-gamma or a log-normal shape. 

A basic idea, founded upon the presence of the long-memory lagged volatility correlations, is to regard the 
volatility as undergoing a certain stochastic process. A convenient feature of this approach is its consistency: the 
volatility process should be constructed from historical time series (in particular, it should reflect a power-law-like 
decay of the lagged correlations), and thus obtained parameters are then used to make forecasts (through evaluating 
conditional averages) for the value of the volatility over some future time horizon At. In other words, even long- 
time horizons become available for meaningful risk assessment; although of course for long At the deficiency of past 
data excludes a possibility of backtesting of these forecasts. There exists a plethora of propositions for volatility 
processes. One selecting requirement is actually computational accessibility of forecasting, which practically reduces 
the possible choices to quadratic processes only, i.e., where the variance depends linearly on the past squared returns, 
see below. This still yields a very broad class of processes, falling under the name of "auto-regressive conditional 
heteroscedasticity" (ARCH) . 

A standard textbook example, reflecting to some extent the behavior of financial time series, is the GARCH(1, 1) 
model US-Hi, 

al = WooCTmoan + (1 " ''"oo) Crhist.,a^ CrLt.,a = "Crhist..Q-l + (1 - ")^a-l (12) 

(the asset index i is skipped here and in the remainder of this paragraph). In the above, cr^oan the unconditional 
mean variance, representing the average long-run value of the variance, and crj^j^j ^ is a "historical (auto-regressive) 
variance," depending linearly on both itself and the realized squared return at the preceding time moment (the daily 
frequency is typically used). The constant Woo is a "coupling," while a G [0, 1] may be thought of as measuring the 
responsiveness of the variance to the recent realized variance Ra-i- ot close to 1 means that the variance responds 
quite slowly to the news. (We may also write ^ differently, al = cr,^oan + 5i(o-a-i " CTmoan) + 52(i?a-i ^ crLi), 
where 171 = 1 — Woo(l — a) and 52 = (1 — Woo){^ — oi), in order to see that the process tries to revert the volatility 
to its mean value with strength gi , and also incorporates a feedback of the difference between the realized squared 
return R^^i and its mean value (J^_i on the next-day value of the variance, to which effect a magnitude 32 is given.) 

There are two problems with this traditional model. First, it is an "affine" ("mean-reverting") process, i.e., 
containing the additive term ifooCmoam whose meaning is that in the long run, the average volatility will converge 
to the value CTmean regardlcss of the initial conditions. To every affine process, there exists a corresponding "linear" 
("integrated") process, which simply removes the unconditional expectation by setting Wao = 0, in which case the 
long-term mean volatility depends on the initial conditions or may even not converge; for example, (|12p will yield a 
model called I-GARCII(l). Only the integrated processes can be successfully harnessed for risk forecasts. The reason 
is that an integrated model is described by two parameters less than its mean-reverting counterpart (namely, Wao and 
f moan) 7 and the latter of these is strongly time series dependent; in other words, for a portfolio of N assets, an affine 
model would contribute a large number N of mean unconditional volatilities for estimation, which would produce a 
huge measurement error. In I-GARCH(l), on the other hand, there is only one parameter a accounting for all the 
assets that requires estimation. 

More importantly, both GARCH(1, 1) and I-GARCH(l) (let us henceforth focus on the integrated versions only) 
fail to reproduce the observed power-law decay of the time-lagged volatility correlations, leading instead (the cal- 
culation is doable analytically) to an exponential decay, with the characteristic time (in the units of 5t ~ one day) 
T = -l/loga, 

{-l<rl) {al) {al) (x e'l^-l/^ (13) 

This may be seen in yet another way by unwinding the second part of (I12p . thus casting the variance as a linear 
function of the past squared returns, 

where a necessary cutoff T is introduced. The "weights" with which the past squared returns impact the today's 
variance, scale exponentially as we move backward in time (cx cfi~2i)\ this short-memory scheme is called the "ex- 
ponentially weighted moving average" (EWMA), see for example [i^, chapter 21, and [6l|, [g^]. Despite this evident 
shortcoming, the I-GARCII(l) (EWMA) volatility process has very successfully transpired into the every-day prac- 
tice of many financial institution by being woven into the commonly accepted risk evaluation methodology, RiskMet- 
rics 1994 [63, Ig^]- It was probably due to its simplicity, as it is described by just one parameter a shared by a wide 
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range of securities (its value found to yield forecasts which come closest to the realized variance is a = 0.94, i.e., 
T = 16.2 business days), which is critical since the methodology is to be applied to a great many time series; and 
moreover, it utilizes only the one previous-day observation to update the volatility (so little data needs to be stored). 

Let us briefly mention that the framework of integrated models can be extended to accommodate for the long- 
memory correlations [gl, H^, culminating in the fresh RiskMetrics 2006 [52|. Such processes are still quadratic. 



al=Y,wtRl^„ (15) 



6=1 



where the weights Wb are positive and obey the "sum rule," Y^^=i Wb = ^- For example, RiskMetrics 2006 argues that 
a logarithmic decay consistently proves to be an even better fit to financial data than a power law, 

Wb(xl , (16) 

log To 

where again one parameter tq ^ 3 6 years is enough to capture the long memory of diverse time series. (For 
small time lags, the power-law and log-decay are very similar, with their parameters related approximately through 
v = 1 / \og{TQ / St) . But for longer lags, say beyond one month, the logarithmic fit visibly stands out in quality.) 

Our FRV technique is not yet suited for handling ARCH models like discussed above. However, the FRV calculus 
does bring considerable simplification into working with historical estimators of cross-covariance matrices which 
incorporate weighting schemes (fTS)) . such as the EWMA (|T4)) or log-decay p6|) . They will be defined in par. IIP 3l 
and the case of the EWMA (with all its defects and advantages just highlighted) will be addressed in subsec. IIII Bl 



D. Historical Estimation of the Covariance Matrices 



1. Estimators of Equal-Time Cross-Covanances 



A fundamental problem is how to reliably estimate the covariance matrices from the available historical data |67l . 
Issj . One obstacle lies in the finiteness of the time series, due to which any estimator will contain an amount of 
measurement noise. Let us focus for definiteness on estimating C: since there are N{N + l)/2 independent entries 
in C, and we have at our disposal N time series of length T each (collected in a historical realization R; we will not 
distinguish in notation between random variables and their actual realizations), thus the level of the estimation noise 
may be quantified by the "rectangularity ratio" 

r^^. (17) 

If r — )• (thanks to T — > oo with fixed N, which is a limit commonly used in statistics), any empirical covariance 
matrix should approach the exact one {i.e., it should be asymptotically unbiased). However, r close to zero is usually 
far from financial reality, in which both T and N are large and of comparable size; for example, one may have daily 
data from several years (each of about 260 business days) and may want to consider a major bank's portfolio consisting 
of several hundred of even thousands of assets; hence, the relevant regime is rather the "thermodynamical limit," 

N ^ oo, r — >• oo, such that r = fixed. (18) 

Therefore, any estimator will be (seriously, for realistic values of r) dressed with the measurement noise, and it is of 
paramount importance (from the point of view of risk management, for example) to devise methods which detect in 
the noised estimators information about the true covariances ("cleaning" of the measurement errors); these de-noised 
estimators can then serve for practical purposes (such as evaluating the risk of a portfolio) . (Remark that is also 
precisely the limit in which the standard techniques of random matrix theory are applicable to the random matrix R; 
to be used below.) 

The matrices C and A can be estimated from the past time series R by, for example, 

c = ^RR'^, a EE — R'^R, (19) 
T N 
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which may be called their Pearson estimators (the usual prefactors l/(r — 1) and 1/{N — 1) are replaced in the above 
by 1/T and 1/A'^, respectively, since we can approximately disregard the average value of the returns in comparison 
with their volatilities over the considered time horizons; see par. II C T|) . For any probability distribution of the returns 
such that ^ holds with finite C and A, it is easily checked that the Pearson estimators (IT^ are, up to rescalings, 
unbiased, 

(c) = AfA.iC, (a) = McaA, (20) 

where Aic,i = ;^TrC and A^a,! = yTrA are the first moments of the matrices C and A, see below ([30|. For the 
correlated Gaussian distribution the Pearson estimators are proportional to the respective maximum likelihood 
estimators. 

It is clear that Cij — y^^^i RiaRja represents the equal-time covariance between assets i and j averaged over 
time; similarly, = 2i=i RiaRib shows how the measurements at moments a and b are correlated on average for 
all the assets. These two seemingly very different quantities are in fact very closely related: since RR"^ and R"^R have 
the same non-zero eigenvalues (the larger one has additionally \T — N\ zero modes), thus c and a have also identical 
non-zero eigenvalues up to the factor of r (the latter are l/r times the former). In other words, the information 
content of c and a is equivalent, describing the structure of equal-time correlations between the assets; we will thus 
abandon a henceforth. We will state this point in more quantitative terms (|47|) in par. IIIB 21 

As is well-known, it is possible to describe the N xT correlated Gaussian random variables R in terms of TV x T 
uncorrelated Gaussian variables R; this is achieved through the change R = -s/CR-a/A (the covariance matrices are 
symmetric and positive-definite, therefore their square roots exist), which transforms the correlated Gaussian measure 
^ into the uncorrelated one. 



Pg. (R)DR = exp (^-iTrRTR^ ^ '""P ^ S ^^'^ 



DR, (21) 



with A/q. = (27r)^"^/^, and (. . .)g. denoting the expectation map w.r.t. this probability distribution. Correspondingly, 
the estimator c becomes in the new language more involved, 

c = ^VCRAR^VC. (22) 

With R a random matrix drawn from the distribution (HI]), the estimator c ([^^ is called a "doubly correlated 
Wishart" random matrix. 



2. Estimators of Time-Delayed Cross-Covariances 



It is of course desirable to find an estimator of temporal correlations, i.e., correlations between two assets at two 
different moments in time. It is commonly done through the "lagged covariance matrix estimator," which represents 
non-equal-time (with a time lag d, an integer divisor of the total time series length T, t = T/d =2,3,...) covariance 
between assets i and j averaged over time, 

T-d 

clf = - E = -RD^'^^RT, where D[f = Sa+d.b- (23) 

a=l 

This matrix is non-symmetric, and it will be very interesting to develop a method to deal with it (see [69^ for a solution, 
based on the circular symmetry of the problem and the Gaussian approximation, in the simplest case of C = Iat and 
A — It', here Ik denotes the unit K x K matrix). In the present paper, however, we will not attempt this more 
challenging task, leaving it for future work, but only resort to the simplification [70| of considering a symmetrized 
version of (^5]) . 

c^y-W ^ ^RD.y.n.(d)j^T^ ^l^g^.^ i^p-W ^ 1 (<5^^,^, + , (24) 

which becomes symmetric, and therefore tractable within our present approach, but still carries some information 
about delayed correlations between assets. In terms of the uncorrelated Gaussian variables (PT|) it reads 



^synUd) ^ _ycR\/AD^y™-('^)\/ART\/c. 



(25) 
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This is also a doubly correlated Wishart random matrix, akin to c, albeit with a modified underlying auto-covariance 
matrix, A -> ^AD'^y'^-^'^^VA. 



3. Estimators with Weighting Schemes 



The standard Pearson estimator Cij of the cross-covariance between assets i and j (|19l) is defined as the average 
of the realized cross-covariances RiaRja over the past time moments a. In this way, all these past values of the 
realized cross-covariance have an equal impact on the estimator of the today's cross-covariance. However, when 
discussing an analogous problem for the estimates of the variance in par. II C 21 we discovered that the phenomenon of 
heteroscedasticity, modeled by some quadratic ARCH stochastic process (|15l) . implies the presence in financial time 
series of a long memory, described by the weights Wa (of a power-law or logarithmic decay, but frequently used as 
well is an exponential decay, i.e., the EWMA). In other words, the older the realized variance the more obsolete 
it is, i.e., the more suppressed its contribution to the estimator of the today's variance is, as given by the weight Wa. 
(Here we will adopt a convention that in our time series, enumerated by a = 1, . . . , T, the most recent observation is 
a = 1, and moving backward in time as a increases.) Now, it is a common practice to set up the updating schemes for 
the cross-covariances by simply mimicking the schemes for the variances; see [60j . compare also the discussion in [66j . 
Therefore, we will consider the following general class of "weighted estimators" of the cross-covariances, 

c^-'''''' =Y,^aR^aRJa. ^.e., c^'sht ^ _ R^R^ , where W = Tdiag (u;i, . . . , «;t) • (26) 

a=l 



Again, it is convenient to convert the correlated Gaussian random variables R into the uncorrelated ones R, which 
leads to an expression analogous to p5|). 

^.weight ^ iycR\/AW\/ARTyc, (27) 

which is the doubly correlated Wishart ensemble with the underlying covariance matrices C and a/AW-v/A. In this 
way, also the weighted estimators have been grasped by our general framework. 



II. FREE RANDOM VARIABLES: A PROMISING APPROACH TO THE ESTIMATION OF 

COVARIANCE MATRICES 



A. The Free Random Variables Calculus in a Nut— Shell 



1. The Basic Notions of Random Matrix Theory 



When studying (see for example [7l|, uM) ^ '^^^l symmetric (or complex Hermitian) K x K random matrix H, 
drawn from some probability distribution P(H), perhaps a most natural question is about the probability distribution 
of its (real) eigenvalues Ai, . . . , Xk, which is quantified by the "mean spectral density," 



pu{X) ^ I E (-5 (A - AO) = ^ (Tr (Al^ - H)) , (28) 

i=l 

where (5(A) is the real Dirac delta function, the expectation map (. . .) is performed w.r.t. P(H), and we recall that 
Ik denotes the unit K x K matrix. 

This statistical information about the spectrum is equivalently encoded in the "Green's function" (also called 
"resolvent," "Cauchy transform" or "Stieltjes transform"), which is a complex function of a complex variable z. 
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For finite if, this is a meromorphic function, witli tlie poles at the A^'s on the real axis. On the other hand, in 
the usually considered limit of an infinitely large random matrix {K — >■ oo), the mean eigenvalues tend to merge 
into continuous intervals ("cuts"; they can be infinite or finite, connected or not), and the Green's function becomes 
holomorphic everywhere on the complex plane except the cuts on the real line. As such, it can typically be expanded 
into a power series around z — s- oo, 

Gh(2) = E ^^H,„ ^ ^ (TrH") - / dApH(A)A", (30) 

n>0 ^ •^'^"'^ 

where the coefficients are called the "moments" of H. In particular, in the strict limit z — > oo, it must obey 

Gh(2:) -, for z oo. (31) 

The above expansion (pO)) suggests working with an alternative object to the Green's function, namely the "generating 
function of the moments" (or the "Af-transform" ) , simply related to the former, 

Mh(z)^zGh(z)-1 = E^- (32) 

n>l 

We will be using both, depending on convenience, but chiefly However, we stress that even if the moments do 

not exist, and thus the expansions (l30t . (|32p are not valid, the knowledge of the analytical structure of the Green's 
function (|^^ is sufficient to extract the statistical spectral properties of the random matrix. 

Namely, once the Green's function has been derived, the corresponding mean spectral density is found by using 
the Sokhotsky's formula, limj_j.o+ 1/(A + ie) = pv(l/A) — i7r(5(A), which yields 

Ph(A) = lim ImGH(A + ie). (33) 

TT e-i.O+ 

In other words, the density is inferred from the behavior of the Green's function in the imaginary vicinity of the 
eigenvalues' cuts on the real axis. 

Finally, let us introduce the functional inverses of the Green's function and the moments' generating function, 

Gh(Bh(2))-Sh(Gh(z)) = 2, M^{NM{z))^N-a{MYi{z))^z. (34) 

The former has somewhat fancifully been named 73] the "Blue's function" (known also under other names in liter- 
ature), while the latter will more conservatively be called the "A'^-transform." These two functions are fundamental 
objects within the FRV approach, see below. Additionally, the Blue's function can be expanded into a power series 
around z = Q: it must start from a singular term 1/ z due to (j3ip plus a regular expansion, 

Bh(2) = - + VifH,„+l2", (35) 
z 

n>0 

where the coefficients, for the reason explained below, are referred to as "free cumulants." (Let us mention that 
there is another commonly exploited object equivalent to the Blue's function, which subtracts the singular term from 
the above expansion, and is named the "i?-transform," Ru.{z) = B-ii{z) ~ 1/ z. We will however adhere to using the 
Blue's function.) 



2. The Basic Notions of the Free Random Variables Calculus 



Let us now detail the key features of the so-called "free random variables" (FRV) calculus, presented parallel to 
the corresponding notions in the standard probability calculus. 

An important problem in classical probability [74] is to find the probability density function (PDF) of 
the sum of two random variables, xi+X2, provided they are independent, and we are given their separate 
PDFs, and Px2- This is readily solved by applying the Newton's formula to the moments of the sum. 
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Mx^+X2,n = {{xi + 2^2)") = X]fc=o (feO-^a^i.fc^sjs.n-fc- The momcnts are conveniently encoded in terms of the "charac- 
teristic function," which is a Fourier transform of the PDF, 

5.(z)^^^z" = (e^^). (36) 

n>Q 

(Here z must be a purely imaginary number on account of convergence of the sum over n, but we will not explicitly 
print this for the sake of future reference.) The above addition rule for the moments can be re-stated as that the 
characteristic function is multiplicative under the addition of independent random variables. In other words, its 
logarithm, 

r^{z) = log g^{z), (37) 

is additive, 

rxi+x2{z) ^ r^j^iz) + r^^iz), for independent a; 1, X2. (38) 

This may be named the "classical addition law" ; it shows that the addition problem for classical independent random 
variables is solved by (i) forming the Fourier transforms of the PDFs Px^ and P^^, i.e., the characteristic functions, (ii) 
taking their logarithms, (iii) using the fact that the logarithms of the characteristic functions are additive, (iv) removing 
the logarithm, which yields the characteristic function, and so also the moments and the PDF, of the sum xi + X2. 
(The logarithm of the characteristic function can be expanded in a power series around z = 0, r^^z) = X]n>i ^x,nZ^ ■ 
Its coefficients are called the "cumulants," and are obviously additive, kxi+x2,n = kxi,n + kx2,n, upon the addition of 
two independent random variables.) 

It is very far from trivial how to extend these steps into the case of random matrices, i.e., from the commutative 
to non-commutative level, and it is the FRV theory of Voiculescu et al. and Speicher [2^ [l^l that develops a precise 
answer to this question. First of all, FRV puts forth a powerful concept of "freeness," which is a non-commutative 
analog of independence. We will not delve too deep into explaining its construction, but let us show how it differs 
from classical independence. Namely, in classical probability, xi and X2 are independent if their demeaned versions, 
Xi^2 = 2^1,2 ~ (2^1,2), obey {X1X2) ~ 0. In non-commutative probability, the m non-commutative random variables 
(random matrices) Xi, . . . ,Xm are called "free" if their demeaned versions Xj = Xj — (x^) satisfy 

(pi(X,J...p„(X,J) =0, (39) 

for all positive integers n, all polynomials pi, . . . ,pn, and all indices ji, . . . , j„ = 1, . . . , m such that ji 7^ j2 7^ • • • 7^ jn- 
For example, if xi and X2 are free, there will be (x^Xj) = (xf)(x|), i.e., just like for independent classical variables, 
but also (X1X2X1X2) = (x^)(x2)^ + (xi)^(x2) — (xi)^(x2)^, much differently than in the commutative situation. In 
other words, the mixed moments of free non-commutative random variables generally do not factorize into separate 
moments, as it is the case for independence. Freeness is therefore a much more involved property. To give a practical 
summary, let us state that random matrices drawn from factorized distributions exhibit (asymptotically, i.e., when 
their sizes tend to infinity) freeness. (Borrowing a picture from physics, we may say that freeness is equivalent to 
planarity in the limit of a large number of colors in field theory [75. 76].) 

Freeness is a relevant idea because the problem of adding two free non-commutative random variables, Xi + X2, 
can be solved in a way analogous to its classical counterpart. Without any proofs (which are not very complicated 
but lengthy), we will just describe the resulting procedure: 

Step 1: The moments of the free random matrices, xi and X2, are conveniently encoded in the Green's functions 
G^,{z) and G'x,(z) ([Ml), dSOl). 

Step 2: The Green's functions are inverted functionally to obtain the respective Blue's functions B^^{z) and By^.^{z) 

(El- 
Step 3: The Blue's functions obey the "non-commutative addition law," 

Bxi+X2(2) = Bxi(2) + Sx,(2:) - ^, for free xi, X2. (40) 

(Equivalently, this means that the i?-transforms are additive, i?xi+x2(-2^) — Rxiiz) + Rx2{z). Trivially, the free 
cumulants psp are additive as well, K^-^^x2.n = ^xi,n + -f'^x2,n- Let us also mention, for the readers familiar 
with the Feynman diagrammatic techniques, that the additivity of the i?-transform can be explained in terms 
of the additivity of the self-energy.) 
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Step 4: Invert functionally Bxi+X2{z) to find the Green's function of the sum, G'xi+X2(^)i S'Hd subsequently, its mean 
spectral density Pxi+X2(-^) through the Sokhotsky formula ((33| . 



We recognize that it parallels the classical construction: the Green's function is an analog of the characteristic 
function (j36p , functional inversion and forming the i?-transform replaced taking the logarithm p7p , and the addition 
law is a direct generalization of the classical one The correspondence between the classical probability calculus 

and matrix probability calculus (FRV) is thus summarized in the following chart: 



PDF o 
i 

characteristic function o 

logarithm of characteristic function -H- 

i 

additivity for independent variables -H- 



spectral density 
. ^ 

Green's function 

; 

i?-transform 

; 

additivity for free variables 



(41) 



A closely related problem is how to deduce a composition law for the multiplication of free random matrices. The 
distribution of a product of independent random variables is not widely discussed in textbooks on classical probability 
theory, since it can be derived from the relation exp xi exp X2 — exp(a;i + X2), which reduces the multiplication problem 
to the addition one by a change of variables. However, this is not the case for random matrices, which do not 
commute: in general, expxiexpx2 7^ exp(xi +X2). This notwithstanding, there exists (26j a transformation (called 
the "^-transformation" ) which allows one to calculate the resolvent of a product of free random matrices X1X2 from 
the resolvents of each separate term, just like there is the i?-transformation for the sum. Again without proofs, the 
multiplication algorithm is: 



Step 1: The moments of the free random matrices, Xi and X2, are conveniently encoded in the moments' generating 
functions Mxi(^) and M:^^{z) ((5^ . 

Step 2: The moments' generating functions are inverted functionally to obtain the respective TV-transforms iVxi(z) 
and Nx,{z) dMl)- 

Step 3: The A'^-transforms obey the "non-commutative multiplication law," 

Nx,M = T^N^dz)Nx2iz), for free xi, X2. (42) 
1 + z 

(Equivalently, this means that the so-called "S'-transforms," Sx{z) ^ (1 + z) /{zNjf^{z)), are multiplicative, 

5'xiX2(2) = 5x1 (2)5x2 (2:)-) 

Step 4: Invert functionally Nx-^x2{z) to find the moments' generating function of the product, Mx-^x^{z), and subse- 
quently, its Green's function and mean spectral density. 



Let us close with a few comments: 



• There is a one-to-one correspondence between classical and free random variables, which in particular allows 
one to map probability densities of random variables into the corresponding eigenvalues' densities of large free 
random matrices [77| . 

• Also, one can define the analog of the concept of stability [zll, which in the FRV calculus assumes the form of 
spectral stability. 

• A consequence of the above two observations is that the eigenvalues' distribution of a properly normalized sum of 
many random matrices for which the second spectral moment is finite tends to a universal limiting distribution 
known in RMT as Wigner's semicircle law 79]. The Wigner's distribution in the FRV calculus corresponds to 
the Gaussian distribution in the standard probability calculus. 

• Another consequence is that there exists a counterpart of the Levy stable distributions for FRV. Since large 
random matrices asymptotically represent free random variables, one can expect the existence of large free 
random matrices in the Levy stability class. We will exploit this fact in a forthcoming publication. 
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• Recently, it has been proven that FRV exhibits central theorems for extreme values [8(1] , again in a one-to-one 
correspondence with the extreme values' distributions known in classical probability from the Fisher-Tippet 
theorem, i.e., the Frechet, WeibuU and Gumbel distributions. 

• For completeness, let us also mention that FRV can also generate dynamical stochastic processes [8TI-|83|. alike 
Gaussian distributions generate random walks in classical probability. We will not discuss them in this work, 
restricting our attention to stationary properties of FRV only. 



B. The Uncorrelated Wishart Ensemble From FRV 



1. The Estimator c for C = Ijv and A = It 



As a first display of the efficiency of the FRV method, we re-derive the Green's function (equivalently, the 
moments' generating function; consequently, the density) of the so-called uncorrelated Wishart ensemble [46], that 
is, the random matrix c in which C = 1^ and A = It has been set, 

c = -RR'^, a = — R'^R, (43) 



with the uncorrelated Gaussian probability distribution Pg.(R-) (EH)- These are the Pearson estimators of the cross- 
covariance and auto-covariance matrices, respectively, with the trivial underlying covariance structure Ciajb — ^ij^ab- 
We hope that this short, simple, and entirely algebraic calculation of the result which is relatively well-known in the 
quantitative finance community (the Marcenko-Pastur distribution [45j). but found previously only with aid of more 
involved tools (such as the planar diagrammatic expansion or the replica trick), will convince the reader about the 
obvious advantages of the FRV calculus. 

We will show that the iV-transform of c, for any value of r > 0, reads 

(1 + z)(l + rz) , ^ 

N,{z) - ^ ^ (44) 

which after functional inversion (solving a quadratic equation; the proper one of the two solutions is chosen so to 
satisfy (j3ip . which implies the minus sign before the principal square root) yields the moments' generating function 
(which we will not print), and upon using p2p . also the Green's function, 

G.(z) = i±^-i-^^iHMIiEZ3, where A,.(l±VF)^ (45) 
2rz 

The Sokhotsky formula ((33)) then finally leads to the celebrated Marcenko-Pastur spectral density. 



Pc(A) = for Ae[A_,A+]. (46) 



2. The Duality 



Before we proceed to the derivation, it is important to express in a quantitative way the relation (a "duality") 
between c and a announced already in par. II D II this argumentation is valid for arbitrary C and A. Indeed, the 
moments satisfy „ = ?'"~^Ma,n, for any n> 1 and regardless of the value of r > 0, due to the cyclic property of 
the trace (and M^fi = -Ma,o = !)■ In terms of their generating functions, and consequently the Green's functions, this 
relation reads 

1 — r 

Mjyz) rMc{rz), or equivalently Ga{z) = r^G'c(rz) H . (47) 

z 

These formulae can obviously be inverted to yield c in terms of a: it amounts to simultaneously exchanging c ^ a, 
C ^ A and r ^ 1/r. Also, they remain intact even when the measure is not Gaussian, and even when the moments 
do not exist; in this case, a proof features a simple algebraic manipulation using the definition of the Green's function 
()29|) . As mentioned before, (|47p means that the information carried by c and a is equivalent, and we may safely forget 
about one of them, say a. Also, we will use (H71) in the following. 
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3. An FRV Derivation of gj) 



We will now present a purely algebraic computation (8J, |85| of the A''-transfornis of both the uncorrelated Wishart 
matrices c and a based on the multiplication property of the TV-transform for free random matrices (|42p . 

It is convenient to start from assuming N < T {i.e., r < 1) and considering the random T xT matrix (l/r)R''"R. 
The following trick is exploited in order to work with square matrices instead of rectangular: We introduce a square T x 
T random matrix X with real uncorrelated Gaussian entries, Pq,(X.) exp(— (1/2) J2ab-^ab) ~ exp(— (l/2)TrX"'"X). 
Next, we use the projector 

P = diag(lAr,OT-w), (48) 

to cut from X an iV x T rectangle, Rq = PX. More precisely, this is _a square T x T matrix whose all the entries are 
zero but the "upper" N xT rectangle. This rectangle may be called R, since all its entries are uncorrelated Gaussian 
random variables. Hence, 

IrTr = IrJRo = ix^PX. (49) 

Furthermore, thanks to the cyclic property of the trace, all the moments of this matrix are equal to the moments 
of (1/T)PXX"'", so also their A^-transforms coincide. Now, this is a product of two free matrices, P and (1/T)XX"'', 
therefore, the multiplication law (|42p allows to write 



Ni 



'ixTPx(^) = ^iPxxT(2) - YT^^p(2)A^ixxT(^). (50) 



The iV-transform of the projector is easily computed, 



r 



Np{z)^l + -, (51) 

z 

because all its moments Mp,„ = (l/T)TrP" = (l/r)TrP ^r, n > 1, hence Mp(z) = r/{z - 1), whose functional 
inversion is the above. 

It remains therefore to find the iV-transform of (l/r)XX'^. We recognize that this is an uncorrelated Wishart 
random matrix with r = 1; in other words, the projector trick and the FRV multiplication law reduced the problem 
with an arbitrary r to solving the r = 1 case. Now, this simplified problem is handled by noticing that the spectral 
properties of the r — 1 Wishart ensemble are equivalent to that of the squared Gaussian Orthogonal Ensemble 
(GOE). The argumentation is more clear in the case of complex entries in X; and at the leading order in the large-T 
limit there is no difference between the real and complex versions. Namely, X can be decomposed as a sum of its 
Hcrmitian and anti-Hermitian parts, X = Hi + iH2, which implies TrXX^ = Tr(H^ + H|). This means that the 
Gaussian measure for X factorizes, i.e., Hi and H2 are two independent Hcrmitian random matrices (GUEs). More 
generally, Tr(XX^')" = Tr(Hi + H|)", for any integer n> 1, hence the random matrix XX^^ is equivalent to a sum of 
two squared GUEs. Returning to real matrices, and taking into account the corresponding rescaling of the variance, 
we arrive at the conclusion that 

N^xx-iz) = Ngoe^z)- (52) 

The spectral properties of the square of a matrix are related to those of the matrix by a simple algebraic manipu- 
lation, l/{z'^lx — H^) = {1/{z1t — H) + 1/(z1t + H))/2z, which implies, in the relevant situation when all the odd 
moments vanish, 

Mh2 (z2) = Mh(z). (53) 
The moments' generating function of the GOE is well-known and given by the Wigner's formula [79| , 

Mgoe{z) - ^ - Vz^ - 4) - 1. (54) 
These ingredients ((5^ . ([55]) . ^f5^ assembled together lead to the iV-transform of the r — 1 uncorrelated Wishart, 

(1 + z)^ 

A^ixxT(^) = ^^^- (55) 
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Plugging dnH) and into dSH]), and using (gH]), finally yields 

which we recall has been derived for r < 1. The scaling relation Ng-H{z) ~ gNu^z), true for any random matrix H 
and non-zero complex constant g, implies further that 

(l + z)(r + z) 

^iRTR(^) - • (57) 

Moreover, the cyclic property of the trace applied in these formulae provides us with 

- ii±iM, ,58, 

and 

Although we originally assumed that r < 1, we observe that ([5S|) and ([5^ transform into each other as we exchange 
r ^ 1/r and R R"^, as do (|57| and ((58l) : this is precisely the duality (|47|) . It means that all these results hold true 
for any r > 0. This completes the derivation, since ([58| is the desired A^-transform of c (1441) . 



C. The Doubly Correlated Wishart Ensemble From FRV 



1. The Estimator c for Arbitrary C and A (the Main Result) 



In this subsection, which constitutes the central piece of our work, we will consider the dou- 
bly correlated Wishart random matrix c = (1/T)a/CRAR"^\/C (I22p . as well as its time-lagged version 
(.sym.(d) ^ (l/T)VCRVAD'^y'°•('''^/ART^/C (gS]), and show how a back-of-an-envelope calculation, founded upon 
the FRV multiplication law (|42p . leads to expressions for the A^-transforms of these estimators; through functional 
inversions, these expressions will yield equations for the moments' generating functions of c and c^y'^-i'^) ^ which in 
turn carry the full information about the spectral properties of these estimators. 

The A^ -transform of the estimator c in the case of arbitrary underlying covariance matrices will be found in 
par. HTCSl to be 

N,{z)^rzNA{rz)Nciz). (60) 
In other words, this is an equation for the moments' generating function M = Mc{z), 

z = rMNA{rM)Nc{M). (61) 



A few comments are in place: 

• When C is arbitrary, but there are no auto-covariances, A = It, we have Na{z) = 1 + l/z, hence equation 
([6T|) becomes 

M-^^c(^). (62) 

• A similar simplification occurs when A is arbitrary, but there are no cross-covariances, C = Ijv, in which case 
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2. The Estimator c'^™-*'^) /or Arbitrary C and A 



A one-line computation presented in par. Ill C 41 leads from ([60)) to a formula for the A^-transform of the time- 
lagged estimator c^y™-id-\ since the latter is a version of the former with a modified auto-covariance matrix A, 

2 2 

A^c^y^-C^) (z) = -^-^A^D^y^ c^) irz)NA{rz)Nc{z). (64) 
I + rz 

Equivalently, this is an equation obeyed by the moments' generating function M = Afj.sym.(d) (z), 

No,y.,.w{rM)NA{rM)Nc{M), (65) 



1 + rM 

In par. Ill CTl we derive an explicit expression for the Green's function of the symmetrized delay matrix D^y'^-('^) 
([M)) . which allows to find its Af-transform. Recalling that t = T/d is an integer > 2, there is 

1 * 1 f ^ yy^i -o V^Fj- for t even, 

t^z-cosj^ 7- + ^ >J7 1 , ,1 for t odd. 

Notice that this result ()66p does not depend on T or ci separately, but only on their ratio t. In particular, it remains 
true in the limit 

T — > oo, d — oo, such that t = fixed, (67) 

in which we have a very long time series divided into a fixed number of very long lags. Such a situation may be 
financially relevant: Indeed, a natural choice for the lag d would be the scale r of the true temporal correlations existing 
in the system (in the units of 8t). And for example, if one assumes that the proper description of heteroscedasticity 
is through the I-GARCII(l) process with the parameter a (see par. II C 2|) . there appears a characteristic time t = 
— 1/loga. A financially justified limit (|99l) . which we discuss later, can then be taken in which t ^ T; hence, ([57)1 
seems to be able to probe a relevant regime. 

Another interesting limit would be of a very long time series with a finite time lag, 

T — )■ oo, t oo, such that d = fixed, (68) 
in which case the sum in (|66p can be approximated by an integral, 



1 1 J- ^ 

G'r)Eiym.(d=tixod) (z) = / dx ; — - = —= , hence, A'T-,sym.(d=fixod) (z) = — :. (69) 

Jo z - cos(7rx) - 1 / /r, , N 



1 + z 
V^(2 + z)' 



(This can be checked to be equivalent to the infinite symmetrized delay matrix with d ~ 1, i.e., with the "nearest- 
neighbor" delay. A finite d compared to an infinite T is just like d = 1. We remark that (|69|) may as well be obtained 
through the method sketched in par. IIIIAll ) Equation §B) acquires the form 



, rM 

z^rM^j-^Nj^irM)Nc{M). (70) 

Formally, it is equivalent to the corresponding one for the usual estimator c ()6ip with the substitution 
z — > Zy^l + 2/ (rM). Let us however print some of its special cases: 

• When there are no underlying covariances, C = Ijv and A = It, (|70p becomes a fourth-order polynomial 
(Ferrari) equation for M, 



z 



r'^M^ + 2r(l + r)M'^ + (l + 4r + - z^) Af^ + 2 ( 1 + r - ^ ) M + 1 = 0. (71) 
It coincides with the result presented without proof in [t^I • 
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For C arbitrary and A = 1^, 



M = Mci \/l + — ]. (72) 



For A is arbitrary and C = 1 



N, 



rM = Mx{^^ T\l + — \. (73) 

I r(l + M) \ rM j ^ ' 



3. An FRV Derivation of m)]) 

The idea behind the following proof is to reduce the problem in the case of arbitrary underlying covariance matrices 
C and A to the uncorrelated version (solved in par. IIIB 3} by successive use of the cyclic property of the trace and 
the FRV multiplication formula for the A^-transforms (|42| . Indeed, the cyclic property allows to write 

^c(^) = iV,^ARTcW = ---- (74) 
Being a product of two free random matrices, the multiplication law gives further 



z 



1 + z T 

Again, the cyclic property applied to the first of these matrices implies 



N^RAR-iz)Nciz)^.... (75) 



z 



^iRTRAM^c(2) = ..., (76) 



1 + 2 T 

where the argument rz appeared because the cyclic shift changed an N x N matrix into aT xT one, which accordingly 
rescaled the moments. The first A^-transform here is of a product of two free random matrices, hence further 

Z TZ 

N^-^T^{rz)NA{rz)Nc{z) = . . . . (77) 



1 + z 1 + rz T 

In this way, exploiting twice the cyclic property of the trace and twice the FRV multiplication law, the problem has 
been boiled down to the uncorrelated case, solved in ([55|. which finally produces the announced result (|60p . 

...^rzNAirz)Nc{z), (78) 

equivalent to equation ([6T|) for Mdz). 
Let us make a few comments: 

• The method is remarkably simpler than other known approaches (planar Feynman diagrams, the replica trick). 
Equation ([6T|) has been found through diagrammatics in [l^, HH]; and even earlier, in the case of A = It, 

in mi,!!!]. 

• It does not rely on the existence of the moments. 

• It is not specified to Gaussian randomness. In particular, it may be extended to the more general instance of 
the Levy randomness. 

• It can be generalized to longer strings of free random matrices. 

• It is valid (as is the entire FRV calculus) only in the thermodynamical limit oi N, T large with r — N/T 
fixed. For finite values of N, T, there will in general be finite-size corrections 0{1/NP), where p depends on the 
type of randomness. 
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4- An FRV Derivation of ^64^ and 



As stated in par. IIP 21 the estimator of the symmetrized time-delayed cross-covariance matrix c'^y'" (^) ^ has 
the same form as the usual estimator c ((22)) . only with a modified true auto-covariance matrix, A — ■\/AD''y™ ('''\/A. 
Therefore, the formula (|64l) for the A^-transform of c^^™-^''^ is proven by using the result (pO)) for c with this modification 
included. Now, the A'^-transform for the modified underlying auto-covariance matrix is obtained through the cyclic 
property of the trace and the FRV multiplication law, 

which readily justifies (|64l) . 

The obstacle we are facing at this point is to evaluate the A'-transform of the symmetrized delay matrix 
-^afc™ ~ (l/2)(^a+d,& + i5a-d,b) (l24| . This is a symmetric T xT matrix, and we recall that the lag d is an in- 
teger such that t = T/d is an integer > 2; as for now, these numbers are finite. For this purpose, the delay matrix 
should be diagonalized. 

First, we remark that D''^™ *^'^^ can be regarded as a t x t block matrix, with blocks of size dx d, each proportional 
to the unit matrix Id, and the block matrix having the structure of the so-called "nearest-neighbor delay matrix," 
which is T)^y^-'^d=i) gi^e t x t, D^""-^^^ = (l/2)((5^+i.t, + j,), a,6= Concisely, 

J3sym.(d) ^ ]-)n.n.(i) ^ (gQ) 

We infer from ((511)) that the eigenvalues of D'^''™-^'') are just the eigenvalues of D" "-(*\ denote them by A^ " *-*'', each 
one taken d times. Consequently, 

1 ^ d 1 ^ 1 

GD=ym.(d) (^) = 51 7^::^ " GDn-n-(*) (^)- (8l) 

i.e., the two Green's functions are equal. The task is thus reduced to diagonalizing the t xt nearest-neighbor delay 
matrix. 

This can be done analytically. For simplicity, consider the nearest-neighbor delay matrix without the prefactor 1/2, 
2D'^"'^*-'. Its characteristic determinant, I?'*^(7) = Det(2D" " '-*-' — 7!^), is straightforwardly computed inductively 
w.r.t. t by expanding w.r.t. the first row, I?'^*)(7) — — 7l?'^*~^)(7) — I'(*~^^(7), for t > 2, where we assume V^^^'j) = 1. 
This recurrence relation (which generates the Fibonacci series) can be solved for example by the generating function 
technique [s^, and gives 

T^^'Hl) = ( 4^ - 4^] , (82) 



S2 - Si 



where si, S2 are the two roots of the quadratic equation 1 + 7s + = 0, and we must constrain s\ 7^ si (i.e., I7I 7^ 2), 
since it can be verified that otherwise there are no solutions to the characteristic equation. The characteristic equation 
(7) = is therefore equivalent to 

A^' = 4+^ (83) 

If Si, S2 were real (i.e., I7I > 2), this would imply si = S2, which is impossible. Hence, there must be I7I < 2, and Si, 
S2 complex and mutually conjugate. 



si.2 = -^±i^^-V^ ^e*'-^. where tan (/) = - ^^-^^ , </> G [-tt, tt). (84) 
2 2 7 

Then ([55)) means that si/s2 is a (t + l)-th root of unity; there are (t + 1) such roots, but we must exclude the one 
equal to 1, so there remain t roots, and the characteristic equation becomes 

— = exp , tor a = \,...,t. (85) 

S2 i + 1 

Comparing ()84)) and (|85)) finally provides the eigenvalues of the txt nearest-neighbor delay matrix, 

A-"(*) =cos^, for a = \,...,t. (86) 
The formula for the Green's function (|66p is then immediately recovered. 
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III. EXAMPLES 



A. An Exponentially Decaying Auto— Covariance 



1. Introduction: Infinite Translationally-Invariant Matrices 



In this subsection, we will choose a particular model for the underlying auto-covariance matrix A. It will have 
one generic feature, "translational invariance," which means that the value of a matrix element depends only on the 
distance between its indices, and not on their separate values, 

Aab = A[a ~ b). (87) 

Such a dependence is natural for a matrix describing temporal correlations between measurements. Moreover, we will 
consider A to be infinite, such that its indices range over both positive and negative values, a, 5 G Z. 

There exists a convenient framework for dealing with infinite matrices (not necessarily fulfilling (|87|) ): it is to 
perform the Fourier transformation of the matrix' indices a, 5, replacing them in this way with continuous variables 

P,g e [-71-, tt), 

A{p,q) = J2 e'^^^-'^^^ah, or conversely, Aat = ^ f f dpdqe''^''P-'">^ A{p,q). 

For instance, the Kronecker delta Sat is mapped to the Dirac delta 2TrS{p— q), and matrix multiplication is translated 
to the integration ^ J^^ dp{. . .). In particular, the Fourier transform (j88|) of a translationally-invariant (1871) matrix 
is proportional to the Dirac delta and reads 

A{p,q) ^2TrS{p- q)A{p), where A{p) = e''^'' A{d) or conversely, ^(rf) = — / dpe^''^PA{p). 

(89) 

Knowing the Fourier transform A{p) allows to evaluate the moments' generating function of the matrix A. Indeed, 
consider the matrix Ga = ^/{zIt — A). In other words, Ga{z1t — A) = l^. After the Fourier transformation, this 
equation can be solved as follows, Ga{p, q) = 2Tr6{p — q)/{z — A{q)). Transforming back, 

[GaU r dpc-^P^^-"^ \— EE [GA](a- 6), (90) 

z - A{p) 

and taking trace yields the Green's function of A, 

Ga(2) = ;^TrGA = [Ga](0) T dp 1—, (91) 

J 27r z - A{p) 

where we made use of the property (l/T)TrB — B(Q), true for a translationally-invariant T x T {T ~> oo) matrix B. 
Finally, 



1 r 

Ma{z) ^— dp 



Mp) , X 

(92) 



z - A{p) 



2. An Exponentially Decaying A and an Arbitrary C 

Let us now assume a particular translationally-invariant model of temporal covariances, namely, an exponential 
decay, 

Aab = A{a-h) = e-\''-''\'\ (93) 
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FIG. 1: LEFT: The theoretical eigenvalue density of the empirical cross-covariance matrix c for TV — > oo identical normally 
distributed degrees of freedom, mutually uncorrelated but exponentially correlated in time (|93p . for r = 0.2 and t — 0,1, 2, 4, 8. 
RIGHT: A comparison of the theoretically predicted eigenvalue density with a Monte-Carlo-generated spectrum, for A'' = 100, 
r = 0.2, T = 2, obtained by diagonalizing 4000 matrices. Finite-size effects appear only at the edges of the spectrum. 



where r is a correlation time (in the units of the elementary time step St), and it will be convenient to denote 
7 = coth(l/r). It is a natural model, aiming for example at sketching the temporal behavior described in par. II C Tl 

We start from calculating the Fourier transform ([55]) of A, 

1 _ 

Mp) - , , (94) 

1 — 2e '-'^ cosp + e 

which leads (|92p to its moments' generating function and A^-transform, 



Ma{z) = ^ ^ hence, A^a(^) = 7 + V 7^ - 1 + 4- (95) 

Let the true cross-covariance matrix C be completely arbitrary. The pertinent equation for the moments' gen- 
erating function of the estimator c, M = Mc{z), is (l5Tt . and for the exponentially decaying A ([M)) it assumes the 
form 

M = Mc I , ^ I . (96) 

+ (72-1) M2 + 1 j 

For example, if C = Ijv, (|96p becomes a fourth-order polynomial (Ferrari) equation, 

r^M^ + 2r(r - -fz)M^ + (z^ - 2r7Z + - l) Af^ - 2A/ -1=0. (97) 

This result has been derived by diagrammatic methods in [23j . It can be appropriated numerically to yield M ~ Mc{z), 
translated next to the Green's function (15^ . and finally to the mean spectral density (1551) . plotted in fig.[Il 

As mentioned before, if we want to consider instead the symmetrized time-lagg ed estimator c^y'^-i''-) ^ in the limit 
(1681) . the resulting equation will differ from (I96p only by formally replacing z — >■ z-\/l + 2/{rM), so we will not print 
it explicitly. Then, for C = Ijv, an eight -order polynomial equation is found. 



B. The Exponentially Weighted Moving Average 



As extensively explained in par. II C 2l and lID 31 an implication of assuming that the heteroscedasticity is modeled 
by the I-GARCH(l) process, is that one should use a weighted estimator for the cross-covariance matrix (|27|) with 
the EWMA scheme 

Wab = T-^-^a^-'Sab. (98) 
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Here a G [0,1] is a constant, typically close to 1 (for example, a — 0.94 in RiskMetrics 1994 [63, |6J|), and we will 
actually consider the following double-scaling limit, 

N 

N,T ^oo, a^l-, such that " ^ = ^^^^^ P = T{1 - a) = fixed. (99) 

In this limit, the range of the EWMA suppression r = — l/loga ^ 1/(1 — a) ^ T. (Note that the definition of f3 
differs from the one used in [35], and is more natural for relating the time cutoff t to the length of the time series T.) 



The moments' generating function of W can be explicitly calculated in the limit ((99)) . 

a=l a=i T(l-a)a"-i ri>l ' o'=0 



^ z»(l-a^)"(l-a-"^) 



- 1 = 



which for TV — oo (capturing the double-scaling limit after everything has been expressed through N , r, /3) simplifies 
to 

^ z" (1 - e-'3)" (1 - e^") 1 l-i(c^-l)z 

^ A — --^^-^+^^"g i4(i-cJ). - 



n>l 



Inverting it functionally yields the A'^-transform, 

,3(1 + 2) _ 1 

^-(-)- ^e^-l)(c.--l) - (101) 

Let us assume that the underlying cross-covariance matrix C is arbitrary, and there are no auto-covariances, 
A = It. Then, the weighted estimator c^^^^ ([57]) is the doubly correlated Wishart random matrix with the 
covariance matrices C and W, respectively. Having derived the A'^-transform of the latter (|10ip . we may write 
equation (|6ip for the moments' generating function M = M^ewma[z), 

In particular, for C = l^r, this acquires the form 

_ r/3(l + M) {eP(^+-^i) - l) 

^= (e/3 _ 1) (e^rM _ 1) • 



(103) 



It is an entangled equation. In fig. [2l we present its numerical solution for r = 0.2 and different values of (3: Since 
a general effect of exponential weighting is to effectively reduce the length of the sample (short memory), hence, 
increasing /3 amounts to increasing the noise-to-signal ratio r, which results in broadening of the spectrum. 

In order to compare (|103p with the results known from literature, we rewrite it as an expression for the Blue's 
function (l34l) of c, 



^cW = Mi-;^i°g i-T^ll' (104) 



which is slightly more general than the one recently obtained in [35|,|61|, where the authors first took the limit r — > 0, 
before taking the double-scaling limit (|99p . and here we have an arbitrary r. Having a finite r allows us to consistently 
consider the limit /3 — >■ 0, which reproduces the Marcenko-Pastur spectrum, 

BJz) -+ i + — - — , as P^O. (105) 

z 1 — rz 
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FIG. 2: LEFT: The spectral density for r = 0.2 and /3 = 0, 1, 3, 10. 

RIGHT: Comparison of the theoretical prediction for the mean spectral density from H103P with numerical calculations, for 
r = 0.2, /3 = 5, with iV = 100, T = 500, a = 0.99, over 1000 samples. 



The limit r — > also exists, defining the pole of the Green's function at 1, independently of /?. To recover the findings 
of [35, 61], we define g = r/3 and take r — > with q fixed, 

1 / 1 e^«/'" \ 
Bc{z) ^ - 1 - - log(l - qz) (1 + qz)] , as r ^ 0, <? = fixed, (106) 

z \ q q J 

with the first two terms reproducing [35ll6]|. and the third term converging exponentially fast toward zero. Equation 
P04p is also useful to obtain the support of the underlying spectrum; following [13, , the endpoints are defined as 

a;*=Bc(zO, where (z,) = 0. (107) 

The problem, for generic r and /3, may only be solved numerically; the dependence of the upper and lower endpoints 
of the support is shown in fig [31 



IV. CONCLUSIONS 



Equation (j61|) generalizes the standard result for the eigenvalue density of large-dimensional empirical covariance 
matrices to the case of simultaneous "vertical" and "horizontal" covariances. This set-up, however simplified, does 
capture many real-life problems. For example, the "vertical" covariances can be interpreted as between degrees of 
freedom present on financial markets, and the "horizontal" ones as temporal between the measured samples. Equation 
(|6ip provides an elegant solution to the task of estimation of the covariance matrix from historical financial time series, 
in a variety of ways (Pearson, time-delayed, weighted) and under diverse circumstances (with temporal correlations 
between the volatilities or between the residual returns, with nontrivial inter-asset correlations). Furthermore, our 
method has also a natural interpretation in terms of information theory, for the multiple-input-multiple-output 
(MIMO) systems in wireless telecommunication, where the "vertical" correlations are in the input and the "horizontal" 
ones in the output. Being very general and natural, this description is certainly extendable to other problems in more 
than few areas of science. 

We attempted to introduce to the reader, in a pedagogical and practical fashion, a powerful machinery of the free 
random variables calculus, which is a non-commutative version of classical probability theory. It ushers in tools, based 
on the notion of freeness (which is non-commutative independence), which are fit to handle, in a purely algebraic and 



25 




2 4 6 8 10 

FIG. 3: The endpoints of the support of the mean spectrum as a function of /3, for r = 0.3, 0.8. 

remarkably simple way, the setting of multivariate data displaying both mentioned types of covariances, without any 
recourse to other better known techniques of random matrix theory. Not only so, but there exist very straightforward 
paths to generalize FRV to situations where many of the standard RMT methods are limited, such as of heavy-tailed 
multivariate distributions. 

We hope that this paper will have a sizable impact on the quantitative finance community, communicating the 
potential which lies in FRV and encouraging its applications to modeling and analyzing of financial data under involved 
conditions encountered in real- world problems. 
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